Distributed PCA

Problem Formulation

Principal Component Analysis (PCA) is an unsupervised, non-parametric statistical technique primarily used for dimensionality reduction in machine learning. In the conventional setting, the goal is to recover a low-rank matrix from measurements y_ik = ⟨X*, A_ik⟩, i = 1, …, m, k = 1, …, q. Here X* is the ground-truth matrix, with X* = U* B*, where U* and B* are low-dimensional matrices of size n×r and r×q respectively, and the matrices A_ik of size n×q are the design matrices. A main issue with this conventional formulation is that recovering X* is a centralized algorithm, and it therefore faces memory and computational complexity problems.

To overcome these issues, we present a distributed version of the problem in which y_ik = ⟨x*_k, a_ik⟩, where x*_k is the k-th column of X* and the design vectors a_ik are mutually independent. The goal is still to recover U* and B*, and hence X*. The system consists of a master server and q computing nodes. The idea is that, once we have a good estimate of U*, say Û, the k-th node can use it to compute an estimate b̂_k. Each node then sends b̂_k to the master server, and the server updates Û. The process iterates until a stopping criterion is met. More precisely, at each iteration the server sends y_ik and Û′a_ik, for i = 1, …, m, to the k-th node; the node computes an estimate b̂_k; the nodes send their b̂_k back to the server; and the server uses them to update Û. This gives a decentralized algorithm for PCA that is efficient in both memory and computation. Moreover, each node may join or leave the party at any time.

We present a gradient-based algorithm and show that, if mq ≥ C(n+q)r², then with high probability the algorithm converges. Since mq ≥ (n+q)r measurements are needed in any case, our algorithm requires only r times more measurements than this minimum.

For more information and the algorithm, please refer to the following paper, and please cite it when you use its MATLAB code.
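The alternating node/server scheme described above can be sketched as follows. This is a simplified NumPy illustration, not the authors' MATLAB implementation: the problem sizes, the random initialization of Û, the step size, and the plain gradient step with QR re-orthonormalization are all illustrative assumptions (the paper's method differs in details such as the initialization).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not from the paper).
n, q, r, m = 20, 15, 3, 120   # ambient dim, columns/nodes, rank, measurements per column

# Ground truth X* = U* B*, with U* having orthonormal columns.
U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))
B_star = rng.standard_normal((r, q))
X_star = U_star @ B_star

# Design vectors a_ik and measurements y_ik = <x*_k, a_ik>.
A = rng.standard_normal((q, m, n))        # A[k] holds the m design vectors for node k
Y = np.einsum('kmn,nk->km', A, X_star)    # Y[k, i] = a_ik' x*_k

# Random orthonormal initialization of U-hat (an assumption; the paper
# analyzes a more careful initialization).
U_hat, _ = np.linalg.qr(rng.standard_normal((n, r)))

eta = 0.02 / m  # hand-tuned step size for this toy problem
for it in range(500):
    # Node step: node k receives the projected designs A_k @ U_hat (and y_k)
    # and solves a small r-dimensional least squares problem for b_k.
    M = A @ U_hat                          # shape (q, m, r)
    B_hat = np.stack(
        [np.linalg.lstsq(M[k], Y[k], rcond=None)[0] for k in range(q)], axis=1
    )                                      # shape (r, q)

    # Server step: aggregate the gradient of sum_ik (a_ik' U b_k - y_ik)^2 / 2
    # w.r.t. U, take a gradient step, and re-orthonormalize via QR.
    residual = np.einsum('kmr,rk->km', M, B_hat) - Y
    grad = np.einsum('km,kmn,rk->nr', residual, A, B_hat)
    U_hat, _ = np.linalg.qr(U_hat - eta * grad)

# Recompute B-hat once with the final U-hat, then form the estimate of X*.
M = A @ U_hat
B_hat = np.stack(
    [np.linalg.lstsq(M[k], Y[k], rcond=None)[0] for k in range(q)], axis=1
)
X_hat = U_hat @ B_hat
err = np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)
print(f"relative recovery error: {err:.2e}")
```

Note that each node only ever sees its own column's measurements and the r-dimensional projected designs, never the full n×q problem, which is what makes the scheme memory- and communication-efficient.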
S. Nayer and N. Vaswani, "Fast and Sample-Efficient Federated Low Rank Matrix Recovery from Column-wise Linear and Quadratic Projections".